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Abstract 

A  new  iterative  algorithm,  the  multi-level  algorithm,  for  the  numerical  solution  of  steady 
state  Markov  chains  is  presented.  The  method  utilizes  a  set  of  recursively  ^jarsened  rep¬ 
resentations  of  the  original  system  to  achieve  accelerated  convergence.  It  is  motivated  by 
multigrid  methods,  which  are  widely  used  for  fast  solution  of  partial  differential  equations. 
Initial  results  of  numerical  experiments  are  reported,  showing  significant  reductions  in  com¬ 
putation  time,  often  an  order  of  magnitude  or  more,  relative  to  the  Gauss-Seidel  and  optimal 
SOR  algorithms  for  a  variety  of  test  problems.  The  paper  also  contrasts  and  compares  the 
multi-level  method  with  the  iterative  aggregation-disaggregation  algorithm  of  Takahashi. 
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Markov  systems  generated  by  computer  modeling  tools  such  as  queueing  network,  Petri  nets, 
or  reliability  modeling  packages  may  contain  hundreds  of  thousands  of  states.  The  resulting 
sparse  linear  systems  of  equations  have  a  correspondingly  large  number  of  unknowns  and  must, 
in  general,  be  solved  numerically  using  an  iterative  scheme.  Typical  methods  are  the  Power, 
Gauss-Seidel,  and  SOR  methods.  All  of  these  methods  have  the  drawback  that  they  may  require 
many  iterations  to  reach  a  solution,  particularly  if  the  system  is  large,  or  a  if  high  degree  of 
accuracy  is  required.  This  can  lead  to  unacceptably  long  computation  times. 

A  similar  situation  is  found  when  solving  partial  differential  equations,  where  systems  of 
many  hundreds  of  thousands  of  unknowns  are  not  uncommon.  Here,  however,  a  relatively 
new  algorithm  -  the  multigrid  method  -  has  met  with  considerable  success,  achieving,  under 
appropriate  conditions,  substantially  improved  solution  speeds  compared  to  traditional  methods 
such  as  Gauss-Seidel  and  SOR.  One  should  more  accurately  consider  multigrid  to  be  a  class  of 
methods,  as  the  basic  framework  allows  a  wide  variety  of  choices  for  each  of  the  constituent 
components.  The  method  to  be  presented  in  this  paper  is  in  many  respects  related  to  this 
class  of  algorithms:  the  Markov  system  is  recursively  coarsened  and  values  obtained  from  these 
smaller  systems  are  used  to  achieve  faster  convergence. 

In  this  paper  we  present  a  new  solution  algorithm,  the  Multi-Level  algorithm,  for  Markov 
chains.  Our  initial  experiments  indicate  the  multi-level  algorithm  does  not  require  the  Markov 
chain  to  have  any  special  structure  in  order  to  achieve  excellent  performance,  although  if  such 
structure  does  exist  it  may  be  possible  to  exploit  that  structure  to  achieve  even  better  perfor¬ 
mance.  We  can  offer  no  proof  of  convergence,  but  in  all  experiments  we  have  run  the  method 
always  converges.  The  convergence  theory  for  multigrid  methods  is  relatively  limited,  applying 
largely  to  equations  of  elliptic  type.  However  the  methods  are  widely  used  on  all  classes  of  linear 
and  non-linear  PDEs.  We  present  experimental  results  for  the  Gauss-Seidel,  SOR,  and  Multi- 
Level  algorithms  when  applied  to  Markov  chains  generated  from  birth-death  processes,  finite 
population  tandem  queueing  networks,  blocking  (finite  capacity)  tandem  queueing  networks, 
and  a  canonical  stochastic  Petri-net  model. 

For  purposes  of  brevity  we  will  refer  to  the  Gauss-Seidel  algorithm  as  the  GS  algorithm,  and 
the  Multi-Level  algorithm  as  the  ML  algorithm.  Note  that  the  phrase  "multi-level  algorithm”  is 
also  used  to  denote  a  class  of  methods  related  to  multigrid.  These  bear  a  structural  resemblance 


to  the  scheme  presented  here  in  that  they  make  use  of  coarser  subproblems  to  achieve  acceler¬ 
ated  solution;  they  are,  however,  otherwise  unrelated.  The  remainder  of  the  paper  is  structured 
as  follows.  In  the  following  section,  after  some  preliminary  remarks,  the  multi-level  method  is 
described.  In  section  3  we  describe  related  work  and  compare  and  contrast  the  multi-level  algo¬ 
rithm  with  existing  algorithms.  In  section  4  results  of  experiments  comparing  the  performance 
of  the  method  to  GS,  SOR,  and  the  algorithms  of  Takahashi  for  a  variety  of  test  problems  are 
presented.  In  section  5  we  discuss  the  practical  aspects  of  implementation  of  the  multi-level 
method  and  provide  a  list  of  possible  directions  for  future  research.  In  the  final  section  we 
summarize  the  paper. 

2.  Multi-Level  Solution  Algorithm 

In  this  section  we  explain  our  new  ML  solution  algorithm.  We  first  summarize  classical  multi- 
grid  techniques  in  section  2.1.  We  then  review  classical  Markov  chain  aggregation  techniques  in 
section  2.2.  In  section  2.3  we  give  a  detailed  description  of  our  ML  algorithm. 

2.1  Multigrid  Methods 

Multigrid  algorithms  are  a  relatively  recent  development  in  the  field  of  iterative  solvers  for 
large  systems  of  equations,  dating  from  the  late  seventies  [1].  They  were  originally  applied  to  the 
systems  of  equations  that  arise  from  the  discretization  of  elliptic  boundary  value  problems,  and  it 
is  for  these  equations  that  most  multigrid  theory  has  been  developed.  For  this  class  of  problems 
multigrid  algorithms  are  among  the  fastest  known  solvers,  being  of  optimal  complexity,  i.e. 
having  computation  times  that  are  linear  in  the  size  of  the  input.  An  introduction  to  multigrid 
algorithms  may  be  found  in  [2],  [7]  and  [17]. 

Multigrid  algorithms  begin  by  defining  a  set  of  increasingly  coarse  representations  (grid  levels) 
of  the  original  problem,  each  of  which  has  only  a  fraction  of  the  number  of  degrees  of  freedom 
as  its  predecessor.  The  algorithm  uses  a  standard  iterative  procedure  such  as  GS  at  each  grid 
level  to  quickly  reduce  error  components  that  are  high  frequency  w.r.t.  that  level  ( smoothing ). 
A  smoothing  sweep  through  all  grid  levels  efficiently  reduces  errors  across  the  entire  spectrum. 
In  addition  to  the  smoother,  operators  are  required  that  transfer  information  from  a  fine  to  a 
neighboring  coarse  level  (restriction)  and  vice  versa  ( prolongation ). 


Multigrid  algorithms  work  most  efficiently  on  regularly  structured  elliptic  problems,  whose 


coefficients  vary  smoothly  between  neighboring  unknowns.  In  such  cases  they  can  achieve  an 
error  reduction  of  an  order  of  magnitude  per  iteration.  Conversely,  multigrid  algorithms  for 
problems  that  are  non-elliptic,  unstructured  or  have  rapidly  varying  coefficients  do  not  in  general 
perform  as  well  and  currently  represent  an  active  field  of  research.  Markov  chains  can  possess 
one  or  more  of  the  above  characteristics.  One  branch  of  multigrid  research  which  attempts  to 
deal  with  general  sparse  systems  is  known  as  algebraic  multigrid,  see  [11]. 

Given  an  appropriate  choice  of  smoother,  multigrid  algorithms  can  be  parallelized  with  a 
high  degree  of  efficiency,  see  [8]  for  a  recent  survey.  This  is,  of  course,  an  added  advantage  in 
the  context  of  modern  supercomputer  architectures  and  networked  workstations. 

The  algorithm  to  be  presented  in  section  2.3  may  be  viewed  as  a  multigrid-like  algorithm. 
However,  owing  to  the  absence  of  a  grid  structure  and  because  of  the  difference  in  approach 
to  multigrid  schemes,  we  will  refer  to  the  algorithm  more  abstractly  as  a  multi-level  algorithm. 
Because  of  the  similarities  between  the  algorithms,  we  nevertheless  expect  that  many  of  the 
established  multigrid  ideas  can  be  applied  to  the  Markov  chain  solver,  and  this  indeed  proves  to 
be  the  case. 

2.2  Aggregation  of  Markov  Chains 

Consider  a  steady  state  continuous  time  Markov  chain  consisting  of  n  states  sj  ...sn.  Denote 
the  vector  of  unknowns  by  p,  where  p,  is  the  probability  of  being  in  state  st  . 

We  then  have  to  solve  the  system  of  equations 

Pp  —  0  (1) 

with  the  additional  condition 

i>=i  (2) 

«=1 

Note,  equation  (1)  is  simply  a  reformulation  of  the  classic  continuous  time  Markov  chain 
equation: 

*Q  =  0  (3) 

where  P  is  the  transpose  of  the  generator  matrix  Q,  and  p  is  the  transpose  of  steady  state 
probability  vector  x.  Equations  (1)  and  (2)  form  a  sparse  linear  system  which  is  typically  solved 
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numerically  using  the  GS  or  SOR  algorithm.  These  schemes  suffer  the  drawback  of  needing  a 
large  number  of  iterations  when  n  is  large  or  when  a  high  degree  of  accuracy  is  required. 

A  coarser  representation  of  the  Markov  chain  described  by  P  may  be  obtained  by  aggregation. 
This  means  creating  a  new  Markov  chain  Q  with  the  vector  of  state  probabilities  <7,  each  of  whose 
N  states  5| . . .  Sn  is  derived  from  a  small  number  of  states  of  P.  Figure  1  illustrates  the  situation 
for  an  eight-state  birth/death  chain  (P),  where  states  are  aggregated  in  pairs  to  form  a  four- 
state  coarser  level  system  (Q),  which  in  turn  is  pairwise  aggregated  to  form  the  coarsest  level 
two-state  system  (R). 

In  the  following  we  will  use  the  terms  fine  level  and  coarse  level  to  refer  to  Markov  chains 
where  the  latter  is  obtained  by  aggregation  from  the  former.  The  relation  s*  €  Si  signifies  that 
the  fine  level  state  a*  is  mapped  by  the  aggregation  operation  to  the  coarse  level  state  St. 

The  matrix  Q  of  the  aggregated  system  may  be  chosen  as  follows  : 


•*€S. 


This  is  the  classical  aggregation  equation.  Note  that  the  matrix  Q  is  a  function  not  only  of  the 
fine  level  matrix  P,  but  also  of  the  fine  leyel  solution  vector  p. 

This  yields  the  aggregated  equation  in  the  unknown  q : 

Qq  =  0  ,  (5) 
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with  the  additional  condition 


N 

=  i  . 

It  can  then  be  shown  that 

9.  =  £  Pk  • 


(6) 

(7) 


i.e.  the  solution  q  of  the  aggregated  system  truly  represents  a  coarser  version  of  the  solution  p 
of  the  original  problem.  The  probability  of  being  in  state  $  is  the  sum  of  the  probabilities  of 
being  in  any  of  its  constituent  fine-level  states. 


A  further  relation  between  the  solutions  of  coarse  and  fine  systems  is  given  by  the  disaggre 
gation  equation: 

i=N  £  PlPlk 

(8) 


tart 


E  pi 


This  equation  reduces  to  (1)  when  (7)  is  applied. 


One  may  proceed  recursively  to  obtain  yet  further  coarsenings,  arriving  eventually  at  some 
coarsest  system  which  may  consist  of  only  two  states. 


The  idea  behind  the  ML  algorithm  is  to  use  approximations  obtained  on  coarser  systems  to 
improve  the  current  approximation  to  the  fine  solution.  We  therefore  use  the  coarser  representa¬ 
tion  to  obtain  a  correction  to  p.  One  iteration  of  the  ML  algorithm  will  proceed  in  the  multigrid 
manner  from  the  original,  finest  system  down  to  the  coarsest,  setting  up  coarse  level  equations 
and  performing  GS  smoothing,  and  then  back  up  to  the  finest  level,  computing  and  applying 
corrections.  Note,  however,  that  other  orderings  of  processing  the  various  levels  are  possible. 

2.3  Description  of  the  Algorithm 

We  adopt  the  following  abbreviations  for  vectors  o,  b,  c  € 

a  =  b  +  c  s  a,  =  b,  *  c,,  1  <  *  <  m 

a  =  b/c  =  a,  =  bi/a,  1  <  *  <  m 

We  enter  the  («  +  l)th  iteration  of  the  ML  algorithm  with  the  current  approximation  to  the 
solution  p(n)  obtained  as  a  result  of  the  (n)th  iteration,  whereby  denotes  the  initial  guess, 
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and  begin  by  performing  one  or  more  sweeps  of  tbe  GS  algorithm,  obtaining  the  vector  p: 

P  =  G5(p<">)  .  (9) 

We  assume  throughout  that  application  of  GS  includes  a  subsequent  normalisation  step  to 
enforce  (2).  The  vector  p  will  not,  in  general  be  the  solution  p  of  (1),  but  we  may  write 

p*p*  =  p  ,  (10) 

where  p*  is  the  elementwise  multiplicative  correction  necessary  to  p.  Knowledge  of  p*  would 
immediately  enable  us  to  compute  the  solution  p.  We  may  write  (1)  as 

P{P*P*)  =  0  .  (11) 

Since  p  has  been  smoothed  by  application  of  the  GS  algorithm,  we  assume  that  it  no  longer 
contains  any  high  frequency  error  components,  and  thus  that  p*  is  smooth.  Therefore  we  may 
compute  an  approximation  to  p*  on  a  coarsened  system,  since  the  dimension  of  the  latter  is 
smaller,  and  thus  the  computation  will  be  cheaper.  We  write  a  coarsened  version  of  (11)  as 

Q(9*9*)  =  0  ,  (12) 

where  Q  is  the  matrix  of  the  aggregated  system  and  q*  and  q  represent  aggregated  representations 
of  p*  and  p,  respectively. 

The  coarse  system  matrix  Q  is  chosen  to  be  an  approximation  to  the  the  matrix  Q  from  (4), 
replacing  p  by  p,  since  p  is  not  available  until  the  algorithm  has  converged: 


In  the  case  of  a  converged  solution,  we  will,  however,  have  p  =  p  and  therefore  the  correct 
coarse  matrix  Q  =  Q. 

In  order  to  obtain  q  in  (12)  we  require  an  operator  that  maps  a  fine  level  vector  to  the 
coarser,  aggregated  vector.  This  operator  will  be  denoted  by  R  (from  the  multigrid  restriction 
operation),  and  we  write 

q  =  R(p)  •  (14) 
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We  choose  summation  for  R,  in  accordance  with  (7): 


9  =  R(p)  s  g,  =  52  Pk  • 

•k£Si 


(15) 


This  choice  for  R  has  the  property 


q  =  tf(p)  , 


i.e.  the  exact  fine  level  solution  is  mapped  by  R  to  the  exact  coarse  level  solution.  It  is  clear 
that  this  property  is  necessary,  since  at  convergence,  both  (1)  and  (5)  must  be  satisfied. 

We  proceed  by  defining 

9  =  9*9*  ,  (16) 
thus  obtaining  the  coarse  level  equations  to  be  solved: 


<39  =  0  , 


N 


= 1  • 


(17) 


Solving  (17)  for  q  will  therefore  enable  us  to  compute  9*,  the  coarse  approximation  to  the 
correction  via  (16):  q*  =  9/9. 

We  compute  the  fine-level  correction  from  its  coarse  approximation  using  the  operator  / 
(interpolation): 

P*  =  J(9*)  •  (18) 


We  choose  the  following  operation  for  /: 


P*  =  /(9*)  =  p^  =  9*  €  Si  . 


(19) 


We  then  compute  the  new  iterate  p<n+1)  using 

p(n+1>  =  p  =  C(p,p*)  =  p*p*  .  (20) 

If  the  algorithm  converges,  we  hope  that  p<n+1l  will  be  a  better  approximation  to  p  than  p^n\ 
In  general  we  have  p<n+1)  ^  p,  since  the  confection  p*  was  computed  only  approximately  on  a 
coarse  grid,  using  an  incorrect  matrix  Q  ^  Q. 
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By  analogy  with  the  SOR  scheme,  which  computes  an  over-corrected  iterate  compared  to  the 
underlying  GS  algorithm,  we  may  consider  using  an  overcorrection  for  the  ML  scheme: 

P*  =  /(?•)  =  P2  =  tf(«  +  (l-w)f?)  *€S<  ,  (21) 

where  we  set  0  <  u>  <  1.  For  u>  <  1  such  an  operation  will  overdo  the  correction,  since  values 
q*  <  1  will  be  decreased  and  values  of  q*  >  1  will  be  enlarged.  The  parameter  u  thus  plays  an 
analogous  role  to  that  of  the  over- relaxation  parameter  in  the  SOR  scheme.  It  is  to  be  hoped 
that,  as  is  the  case  for  the  SOR  scheme,  certain  choices  of  u  may  lead  to  improved  solution 
efficiency.  We  will  consider  the  utility  of  over-correction  in  section  4.2. 

The  relationship  between  the  correction  and  the  disaggregation  equation  can  be  seen  by 
comparing  the  r.h.s.  of  (8)  after  application  of  (14): 

i=N 

13  13  PlPlk 

i- 1  *i€Si 


and  after  the  correction  (20): 


13  ?«*  13  P‘Plk 


»= l  »i€5, 


= 

i=l  a,€Si 


After  the  iteration  has  converged,  we  have 


p*  =  ln 


where  1  denotes  the  vector  (1,  1,  . . .,  \)T,  i.e.  no  further  correction  takes  place.  We  then  also 
have  Q  =  Q  and  therefore  q  —  q. 

Note  that  the  question  of  assigning  fine  nodes  to  their  coarse  counterparts  is  still  open.  This 
we  will  call  the  coarsening  strategy  or  aggregation  strategy.  The  aggregation  strategy  can  have 
a  significant  effect  on  the  performance  of  the  ML  algorithm.  In  cases  where  we  have  some 
knowledge  of  the  structure  of  the  Markov  chain,  for  example  when  queueing  networks  are  to  be 
modelled,  then  we  may  utilize  this  information  in  the  construction  of  the  aggregated  system.  In 
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other  cases,  mapping  strongly  coupled  fine  states  to  the  same  aggregated  state  seems  to  be  an 
efficient  strategy. 

Note  that  the  successive  application  of  (18)  followed  by  (14)  fulfills 

R(I{q))  =  cq  c  €  »,  V  q  6  ,  (22) 

1. e.  a  coarse  vector  q  is  invariant  up  to  a  constant  factor  under  the  operation  of  prolongation 
followed  by  restriction.  An  analogous  relationship  between  the  prolongation  and  restriction 
operators  in  a  classical  multigrid  algorithm  is  frequently  required.  Furthermore  the  composite 
coarse  grid  correction  operator  C(p, /(q*))  preserves  the  relative  probabilities  of  all  fine  level 
states  mapped  to  the  same  coarse  level  state  by  aggregation: 

P  =  C(p, /(?*))=>  ?*-  =  s*,,s*3€S„  1  <i<N  . 

Pk2  Pkt 

The  two-level  version  of  the  ML  iteration  is  given  by  the  sequence  of  steps  (9),  (14),  (17),  (18), 
(20).  The  multilevel  algorithm  is  obtained  by  recursive  application  of  the  two-level  algorithm  to 
obtain  a  solution  to  the  aggregated  equation  (17)  and  is  described  in  algorithmic  form  in  Figure 

2.  We  use  the  subscript  /  to  denote  level  of  representation  (/  =  Imax  finest  level,  1  =  0  coarsest 
level).  The  coarse  level  /  —  1  and  fine  level  /  between  which  the  operators  /  and  R  map  are 
identified  by  appropriate  indices.  Note  that,  because  of  the  recursive  nature  of  the  algorithm, 
the  unknowns  q*,  q  and  q  are  represented  by  the  variables  p*_x ,  pi- 1  and  p/_i,  respectively. 

The  Multi-Level  algorithm  is  non-linear,  owing  to  the  use  of  the  coarsened  system  obtained 
via  (13),  although  the  original  problem  (1)  is  linear.  It  seems  therefore  unlikely  that  theoretical 
results  can  be  obtained  for  estimates  of  the  convergence  speed  of  the  algorithm  for  general 
problems. 

We  allow  in  general  the  possibility  of  applying  GS  v  times  at  each  level  with  v  >  1,  denoted 
by  GS".  We  also  consider  the  possibility  of  a  more  complex  cycle  form  (in  particular  F-  and 
W-cycles,  see  the  multigrid  literature),  obtained  by  multiple  recursive  calls  to  procedure  mss. 

3.  Related  Work 

Currently  most  performance  tools  requiring  the  solution  of  Markov  chains  use  either  the 
power,  GS,  or  SOR  algorithms.  In  a  paper  by  Stewart  and  Goyal  [15]  the  various  techniques  are 
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procedure  us(l) 
if  (/  =  0) 

solve  Pipi  =  0 

else 

pi  *  GS^Cp,) 

Pi- 1  *  Ri-i,i(Pi) 

maaO  -  1) 

P/_ i  *  Pl-i/Pi-i 
Pj  m  h-u(PU) 
Pi  ■  C(pufi) 
return 


Figure  2:  Multi-Level  Algorithm 

compared  and  SOR  with  dynamic  tuning  of  the  relaxation  parameter  emerges  as  the  method 
of  choice.  Initial  results  of  the  ML  algorithm  show  that  it  generally  outperforms  optimal  SOR, 
often  by  on  order  of  magnitude  or  more,  without  any  parameters  to  tune. 

Our  work  is  related  to  a  large  body  of  work  on  aggregation-disaggregation  techniques.  Most 

previous  work  using  aggregation  makes  the  assumption  that  the  number  of  aggregate  states,  N, 

is  much  less  than  the  number  of  states,  n,  i.e.  N  <  n.  In  our  algorithm  we  generally  assume 

N  =  - 
a- 

In  addition,  much  of  the  related  work  assumes  that  the  Markov  chains  being  solved  are 
generalized  birth-death  processes  [16,  14],  or  that  the  Markov  chains  are  nearly  completely 
decomposable  systems  [5,  6]  In  the  latter  case  the  solution  is  usually  an  approximation  often 
accompanied  by  bounds  on  the  error.  We  refer  the  reader  to  [12]  for  descriptions  of  these  special 
Markov  chain  structures  and  for  a  more  comprehensive  list  of  references.  Our  work  differs  in 
that  it  does  not  require  any  special  structure  in  the  Markov  chain,  and  the  result  is  exact,  not 
an  approximation. 

The  work  that  most  strongly  resemble  ours  is  the  algorithm  of  Takahashi  [16]  and  its  variants 
[12].  We  subsequently  use  the  terminology  derived  in  the  previous  section.  The  Takahashi 
algorithm  starts  with  an  initial  iterate  for  the  fine  level  chain.  The  fine  level  chain  of  n  states  is 
then  aggregated  into  a  coarse  Markov  chain  of  N  states,  where  N  <  n  using  equation  (13).  The 
coarse  chain  is  then  solved  (equation  (17)),  and  the  correction  obtained  from  the  coarse  chain 
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is  applied  to  the  previous  iterate  in  the  fine  level.  A  new  iterate  at  the  fine  level  is  obtained 
as  follows.  The  fine  level  states  are  grouped  together  into  N  sets  of  states  where  the  members 
of  each  set  correspond  to  each  aggregate  state  in  the  coarse  level.  The  set  of  linear  equations 
corresponding  to  each  of  these  sets  of  states  is  solved  independently  of  the  other  fine  level  states 
by  treating  the  values  for  the  other  states  as  constants.  The  new  iterate  at  the  fine  level  is  the 
result  of  solving  the  equations  for  each  set  of  states.  The  algorithm  iterates  between  the  two 
levels  until  sufficient  accuracy  is  obtained. 

We  may  view  the  Takahashi  algorithm  as  a  special  case  of  ML,  obtained  by  the  following 
choices,  compared  to  the  ML  scheme  we  prefer: 

1.  Use  of  only  two  levels  of  representation  of  the  system,  rather  than  multiply  coarsened 
problems. 

2.  N  <  n,  as  opposed  to  IV  =  ~  for  a  small  c  (typically  2  or  4). 

3.  Block  Gauss- Seidel  or  Block  Jacobi  as  a  smoothing  procedure  on  the  finer  level,  as  opposed 
to  a  small  number  of  steps  of  a  pointwise  Gauss-Seidel  scheme. 

4.  Use  of  post-  rather  than  pre-smoothing,  i.e.  processing  the  fine-level  after,  rather  than 
before  the  coarse-level  solvt. 

It  is  our  claim  that  1,  2  and  3  above  are  not  the  best  choices.  Point  4  proves  to  have  little 
effect,  as  is  also  the  case  for  classical  multigrid  methods,  and  is  in  fact  irrelevant  in  the  context 
of  performing  many  iterations  in  succession.  It  is  therefore  more  a  conceptual  difference.  In 
section  4.3  an  experimental  result  disproves  point  2  above.  In  point  3,  we  observe  that  solving 
entire  subsystems  of  size  n/N  can  be  prohibitively  expensive  and  contend  that  solutions  can 
be  more  efficiently  obtained  by  pointwise  GS  applied  on  a  larger  set  of  more  slowly  coarsened 
problems. 

We  restate  that  our  motivation  for  the  ML  algorithm  was  not  to  modify  current  aggregation- 
disaggregation  algorithms,  but  rather  to  devise  a  algorithm  similar  to  multi-grid  algorithms 
which  have  shown  exceptional  merit  in  solving  elliptic  boundary  value  problems.  We  find  it 
helpful  to  not  view  our  algorithm  as  a  Markov  chain  aggregation-disaggregation  variant,  but 
instead  view  it  as  a  multigrid-like  scheme. 
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4.  Experimental  Results 


In  this  section  we  present  experimental  results  to  show  how  our  new  algorithm  compares 
with  GS  and  SOR.  All  experiments  presented  assume  continuous  time  Markov  chains.  We 
have  also  solved  discrete  time  Markov  chains  with  similar  improvements  in  performance  relative 
to  SOR  and  GS.  In  section  4.1  we  first  compare  ML  to  GS  and  SOR  for  a  variety  of  test 
problems  assuming  an  unsophisticated  implementation  of  the  ML  algorithm.  In  section  4.2 
we  demonstrate  the  potential  of  improving  ML  performance  via  a  few  techniques  including 
intelligent  aggregation,  varying  the  number  of  smoothing  steps  at  each  level,  over-correction, 
and  F  and  W  cycles.  In  section  4.3  we  present  experimental  results  comparing  ML  to  both  the 
Takahashi  and  nested  Takahashi  algorithms  [16]. 

4.1  Generic  Multi-Level  Results 

In  this  section  we  present  our  generic  ML  results.  By  generic  we  mean  that  the  ML  algorithm 
used  is  the  simplest  one  possible,  a  V  cycle  (each  iteration  goes  from  the  finest  level  down  to 
the  coarsest  level  and  back  up  to  the  finest  level),  no  overcorrection,  only  applying  one  iteratijn 
of  the  smoother  (GS)  at  each  level,  and  a  simple  aggregation  strategy.  In  particular,  we  assume 
states  of  the  Markov  chain  are  ordered  0 . .  .n  —  1,  and  that  the  aggregation  policy  is  by  pairing 

neighboring  states  by  index:  a,-  €  5/  O  /  =  |^J  •  If  a  level  has  an  odd  number  of  states  then 

the  last  state  is  included  in  the  last  coarse  level  state.  Unlike  SOR,  this  ML  algorithm  has  no 
parameters  to  tune.  In  all  of  our  experiments  we  give  the  benefit  of  the  doubt  to  SOR  and  assume 

we  can  find  the  optimal  relaxation  parameter  u>.  We  find  u>  to  the  nearest  by  using  a 

binary  search  between  1.0  and  2.0.  This  results  in  over-optimistic  metrics  for  the  SOR  algorithm 
since  in  practice  u  must  be  found  via  dynamic  tuning,  thus  resulting  in  additional  iterations.  By 
presenting  best  case  results  for  SOR  and  worst  case  results  for  ML,  we  strengthen  our  argument 
that  ML  may  be  a  more  promising  solution  algorithm  than  SOR.  In  all  experiments  we  assume 
the  initial  iterate  is  set  to  the  vector  (£,  . . . ,  ^)T.  We  also  assume  the  ML  algorithm  recursively 
coarsens  the  set  of  equations  until  the  final  coarse  sytem  has  only  two  states. 

Possible  metrics  for  comparing  the  algorithms  are  the  number  of  iterations,  the  process  time, 
the  number  of  floating  point  operations  (flops),  and  the  geometric  mean  of  the  convergence  rate. 
The  number  of  flops  was  computed  by  inserting  a  counter  into  all  three  programs.  The  process 
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time  is  obtained  from  the  unix  system  call  timesQ  and  is  the  CPU  time  used  while  executing 
instructions  in  the  user  space  of  the  programs.  In  all  cases  we  have  found  the  process  time  to 
be  a  more  conservative  measure  than  flops,  i.e.  the  comparison  of  ML  to  GS  and  SOR  is  less 
favorable  when  using  process  times  than  when  using  flops.  Hence  to  strengthen  our  arguments 
of  the  utility  of  the  ML  algorithm  we  chose  process  time  as  our  primary  metric.  Note  that  the 
process  time  also  includes  time  for  generation  of  the  ML  structure,  whereas  the  flops  metric 
would  miss  this  factor.  In  addition,  the  flops  metric  does  not  capture  the  additional  pointer 
operations  needed  by  the  ML  algorithm  for  accessing  elements  in  the  coarse  levels.  To  enable 
quicker  comparison  of  the  algorithms  we  also  plot  the  ratio  of  process  time  for  SOR  and  GS 
relative  to  the  " :ocess  time  for  ML.  We  also  consider  the  number  of  iterations  necessary  for 
convergence.  In  general,  the  ML  algorithm  requires  far  fewer  iterations  than  the  other  two 
algorithms,  but  consumes  more  time  per  iteration  since  each  iteration  requires  one  application 
of  the  smoother  at  each  level  and  the  construction  of  the  coarse  level  matrices  Q. 

We  first  consider  a  birth-death  Markov  chain  with  a  finite  number  of  states.  We  vary  the  the 
number  of  states  assuming  a  birth  rate  of  1  and  a  death  rate  of  2.  The  results  are  presented 
in  figure  3.  The  number  of  iterations  increases  linearly  with  the  number  of  states  for  the  GS 
and  SOR  algorithms,  whereas  the  number  of  iterations  remains  fixed  at  21  iterations  for  the  ML 
algorithm.  Intuitively,  probability  mass  only  moves  slowly  through  the  system  in  the  GS  and 
SOR  algorithms,  whereas  one  iteration  of  the  ML  algorithm  can  move  mass  from  one  end  of  the 
chain  to  the  other  via  the  coarse  level  problems. 

The  number  of  flops  and  the  process  time  of  SOR  and  GS  increase  quadratically  for  SOR  and 
GS  with  the  number  of  states,  whereas  both  metrics  increase  only  linearly  for  ML.  Thus  ML 
is  an  optimal  method  for  this  particular  problem.  The  GS  (SOR)  algorithm  requires  257  (128) 
times  more  processing  time  than  ML  for  a  birth-death  chain  of  10,000  states.  The  ratios  increase 
with  system  size.  Even  for  small  birth  death  chains,  such  as  1,000  states,  the  ML  algorithm  is 
more  than  an  order  of  magnitude  faster  than  GS  and  SOR. 

One  possible  reason  that  the  GS  and  SOR  algorithms  require  more  time  as  the  number  of 
states  is  increased  is  that  they  must  move  probability  mass  a  longer  distance  in  the  solution 
vector.  To  determine  whether  it  is  this  distance  or  the  overall  number  of  states  we  conduct  the 
following  experiment.  We  assume  a  birth  death  chain  with  each  state  having  two  additional 
exiting  transitions  beyond  the  birth  and  death.  State  s,  has  a  transition  to  state  s,+m  and  a 
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transition  to  state  s,_m.  If  (i  +  m)  >  n  —  1  then  the  transition  is  to  state  n  -  1.  Similarly,  if 
(t  —  m)  <  0  then  the  transition  is  to  state  so.  We  initially  set  the  number  of  states  equal  to  500 
and  m  equal  to  2.  We  then  successively  double  the  number  of  states  and  the  distance  m.  We 
define  the  diameter  of  a  Markov  chain  to  be  the  maximum  distance  (or  number  of  transitions) 
between  any  two  states.  Hence,  in  this  experiment,  regardless  of  the  number  of  states,  the 
diameter  is  fixed  at  250.  We  assume  all  transitions  in  the  birth  direction  proceed  at  rate  1,  and 
all  transitions  in  the  death  direction  are  at  rate  2. 

The  results  from  this  experiment  are  presented  in  figure  4.  The  number  of  iterations  necessary 
for  SOR  and  GS  initially  rises  with  an  increase  in  the  number  of  states,  but  then  levels  off. 
Conversely,  the  process  time  steadily  increases,  hence  more  work  is  done  per  iteration  as  the  size 
increases.  It  appears  that  both  the  size  and  diameter  of  the  Markov  chain  influence  convergence 
speed  of  GS  and  SOR.  The  number  of  iterations  for  the  ML  algorithm  varies  from  19  to  25.  Thus 
the  diameter  of  the  chain  does  not  appear  to  have  much  effect  on  the  solution  speed  (measured 
in  iterations)  of  the  ML  algorithm. 

We  next  investigate  the  sensitivity  of  the  relative  performance  of  the  algorithms  to  the  ratio 
of  birth  rate  to  death  rate.  We  fix  the  birth  rate  at  1  and  vary  the  death  rate  from  0.001  to 
1000.  The  number  of  states  is  equal  to  10,000.  The  results  are  presented  in  figure  5.  Note  that 
both  axes  are  scaled  logarithmically. 

The  performance  of  the  GS  algorithm  is  always  worse  than  that  of  the  ML  algorithm.  The 
SOR  algorithm  performs  better  than  ML  when  the  death  rate  is  less  than  the  birth  rate,  but 
one  to  two  orders  of  magnitude  worse  when  the  roles  are  reversed.  In  the  best  case  observed 
(for  SOR),  SOR  requires  ^  of  the  process  time  of  ML.  Note  that  the  computation  times  for  GS 
and  SOR  could  be  made  symmetric  by  exchanging  the  birth  and  death  rates  or  by  reordering 
the  states.  The  ML  algorithm  does  not  require  any  such  special  techniques  to  achieve  good 
performance,  and  hence  is  more  resilient  to  changes  in  transition  rates. 

The  excellent  performance  of  SOR  when  the  death  rate  is  less  than  the  birth  rate  is  surprising. 
In  fact,  it  is  not  realistic.  For  a  ratio  of  |  or  less  SOR  converges  in  less  than  30  iterations.  We 
were  able  to  achieve  this  excellent  performance  by  first  determining  the  optimal  u>  to  the  nearest 
by  applying  SOR  many  times  with  u  values  chosen  in  a  binary  search.  In  practical 
situations  the  optima]  value  of  u>  is  not  known  a  priori  and  the  solution  will  be  calculated  only 
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once.  Hence  u  must  be  obtained  via  some  dynamic  procedure.  It  is  impractical  to  assume 
that  an  SOR  algorithm  including  dynamic  tuning  of  u>  can  converge  in  only  30  iterations.  In 
fact,  in  a  recent  paper  proposing  a  dynamic  method  for  determining  u,  [3]  ,  30  iterations  are 
executed  before  tuning  of  u>  even  begins.  The  paper  notes  that  often  thousands  of  iterations 
were  necessary  to  find  the  optimal  u>.  Thus,  the  excellent  performance  of  SOR  in  figure  5  could 
never  be  achieved  in  practice.  We  conjecture  that  ML  will  perform  better  than  any  existing 
dynamic  tuning  SOR  algorithm  for  the  entire  parameter  range  in  this  experiment. 

We  next  explore  Markov  chains  generated  from  simple  queueing  models.  We  first  assume  a 
closed  system  tandem  queue  model.  The  queueing  system  is  shown  in  figure  9.  We  assume  a 
finite  population  where  jobs  think  for  an  exponential  period  of  time  with  rate  A  (i.e.  the  jobs 
visit  an  infinite  server  and  are  served  at  rate  A).  Jobs  are  served  in  queues  1  and  2  at  rates 
/it  and  H2  respectively,  and  upon  leaving  queue  2  return  to  queue  1  with  probability  p.  The 
states  of  the  Markov  chain  are  generated  from  an  initial  state  of  all  jobs  in  the  think  state, 
and  then  by  constructing  the  chain  in  a  breadth-first  fashion.  States  are  numbered  0  through 
n  -  1  as  they  are  created  in  the  breadth  first  search.  The  aggregation  policy  used  in  the  ML 
algorithm  is  to  pair  adjacent-numbered  states.  Hence,  the  performance  of  the  ML  algorithm  is 
almost  certainly  sub-optimal,  since  no  intelligent  aggregation  techniques  are  being  used.  In  all 
experiments  reported  we  fix  the  think  rate  to  1.0. 

In  the  first  experiment  we  set  p,  the  probability  that  a  job  leaving  queue  2  returns  to  queue 
1,  to  0.0,  pi  to  1.0,  and  /t2  to  2.0  and  vary  the  population  from  25  to  250.  This  results  in  a 
range  of  351  to  31,626  states  in  the  underlying  Markov  chain.  The  results  of  the  experiment  are 
plotted  in  i  <ure  6.  The  ML  algorithm  requires  far  less  computation  time  for  the  solution  than 
the  other  algorithms,  especially  as  the  population  increases. 

We  next  consider  the  effect  of  the  relative  values  of  p\  and  /r2  on  the  performance  of  the 
algorithms.  We  fix  the  think  rate  to  1.0,  probability  p  to  0.0,  the  population  to  100  (resulting 
in  5151  states  in  the  underlying  Markov  chain),  set  p\  to  1.0,  and  vary  /x2  from  0.001  to  1000. 
The  results  are  plotted  in  figure  7.  The  ML  policy  results  in  lower  computation  times  than  the 
other  algorithms  across  the  entire  parameter  space,  and  when  p\  >  /t2  the  solution  time  is  an 
order  of  magnitude  faster.  Experiments  with  larger  populations  demonstrate  a  more  pronounced 
difference  in  the  performance  of  ML  relative  to  GS  and  SOR. 
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We  finally  consider  the  effect  of  probability  p,  i.  e.  the  probability  that  a  job  leaving  queue  2 
is  returned  to  queue  1.  We  set  the  think  rate  to  1.0,  the  population  to  150  (11,476  states),  and 
vary  p  for  three  different  settings  of  p\  and  pi-  The  performance  of  ML  is  much  less  sensitive 
to  the  ratio  of  the  rates  and  to  p  than  GS  or  SOR.  When  p\  =  pi  =  1.0,  GS  ranges  from  6  to 
12  times  slower  than  ML,  and  SOR  ranges  from  2  to  6  times  slower  than  ML.  When  /*]  =  1.0 
and  pi  =  2.0,  GS  ranges  from  6  to  12  times  slower  than  ML,  and  SOR  ranges  from  10  times 
slower  to  1.5  times  faster  than  ML.  When  p\  =  2.0  and  pi  =  1.0,  GS  ranges  from  5.3  to  2.2 
times  slower  than  ML,  and  SOR  ranges  from  2.7  times  slower  to  1.5  times  faster  than  ML. 

We  now  consider  the  application  of  the  three  algorithms  to  the  solution  of  the  underlying 
Markov  chain  of  a  canonical  stochastic  Petri  net.  Figure  1 1  shows  the  complexities,  measured 
in  KFLOPs,  of  the  GS,  SOR  and  ML  algorithms  applied  to  the  underlying  Markov  chain  of  the 
stochastic  Petri  net  of  Molloy  [9],  depicted  in  Figure  10,  with  populations  ranging  from  10  to 
60.  The  Markov  chains  for  this  problem  were  generated  using  the  SPNP  ( stochastic  Petri  net 
package)  tool  of  Ciardo  et  al  [4].  ML  is  superior  to  GS  for  all  cases  tested,  the  improvement 
being  a  factor  of  approximately  3.3  for  the  smallest  and  approximately  16.6  for  the  largest  case 
considered.  Against  SOR  with  an  optimally  chosen  overrelaxation  parameter,  ML  performs 
slightly  worse  only  for  the  smallest  problem  considered. 

4.2  Multi-Level  Acceleration  Techniques 

In  this  section  we  consider  a  few  techniques  that  can  be  applied  to  the  ML  algorithm  to 
further  accelerate  convergence  speed.  Some  of  these  techniques  (F  and  W  cycles,  application  of 
the  smoother  multiple  times  at  each  level)  are  borrowed  from  the  multigrid  literature.  There  are 
other  multigrid  techniques  that  may  r  rcve  to  be  useful  to  our  ML  algorithm.  The  over-correction 
idea  is  directly  analogous  to  over-relaxation  in  SOR. 

We  first  consider  how  more  intelligent  aggregation  of  the  Markov  chain  can  affect  the  per¬ 
formance  of  the  ML  algorithm.  We  consider  the  same  tandem  queueing  network  in  section  4.1, 
figure  9,  except  that  we  assume  finite  capacity  (blocking)  queues  with  a  capacity  of  c  and  set 
p  =  0.  We  assume  the  queues  are  finite  capacity  to  facilitate  the  ease  of  an  intelligent  aggre¬ 
gation  technique.  Figure  12  shows  on  the  left  the  Markov  chain  generated  by  this  modified 
tandem  queue.  The  states  of  the  Markov  cha  ^  he  written  as  a  two-dimensional  lattice  of 
size  (c  +  1)  x  (c  +  1).  The  transitions  then  form  a  regular  pattern,  somewhat  analogous  to  the 
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grid  of  a  discretized  PDG.  We  assume  the  states  to  be  numbered  lexicographically  from  top  left 
to  bottom  right  and  for  simplicity  that  c  + 1  be  a  power  of  two.  The  simple  aggregation  strategy 
used  would  successively  pair  states  that  are  adjacent  horizontally  until  no  longer  possible  and 
then  pairwise  vertically,  as  illustrated  on  the  upper  right.  An  alternative  strategy  more  appro¬ 
priate  to  the  structure  of  the  problem  is  shown  on  the  lower  right,  where  fine  level  states  are 
grouped  into  2x2  units. 

Table  1  shows  the  number  of  floating  point  operations  needed  by  the  ML  algorithm  applied  to 
the  Markov  chain  of  Figure  12.  We  compare  the  one-dimensional,  pairwise  aggregation  strategy 
( 1-D)  with  the  two-dimensional  case  (2-D).  We  consider  from  one  to  three  smoothing  steps  with 
GS  {u  =  1,  2,  3).  For  comparison,  GS  requires  30.1  MFLOPs  and  SOR  with  an  optimal  choice 
for  u>  requires  14.1  MFLOPs. 


1  -  D 

2 -D 

V 

1  2  3 

1  2  3 

MFLOPs 

12.7  9.3  8.2 

5.4  5.2  4.9 

Table  1:  Effect  of  v  and  aggregation  strategy  on  the  ML  algorithm. 

It  can  be  clearly  seen  that  the  two-dimensional  aggregation  strategy  is  significantly  faster 
than  the  simpler  scheme.  The  former  can  be  also  be  improved  (by  about  33%)  by  performing 
additional  relaxation  steps,  whereas  the  latter  algorithm,  which  is  already  superior,  improves  to 
a  lesser  extent.  In  this  experiment  the  best-case  ML  scheme  achieves  a  speedup  of  6.1  over  GS 
and  2.9  over  SOR. 


id 

ft&J 

BO 

0.8 

0.6 

0.5 

K221 

EO 

wim 

0.1 

EEf 

MFLOPs 

5.38 

5.21 

4.67 

4.14 

3.78 

3.60 

3.25 

3.25 

3.07 

2.89 

3.07 

Table  2:  Effect  of  overcorrection  on  operation  count. 

Table  2  shows  the  effect  of  the  "overcorrection”  according  to  (21)  on  the  operation  count 
of  the  ML  scheme  applied  to  the  same  problem,  where  u>  ranges  from  1.0  to  0.0.  A  significant 
improvement  is  found  to  be  achievable,  the  optimal  value  of  u  giving  rise  to  an  improvement  of 
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approximately  53%  over  the  unmodified  correction.  The  ML  scheme  with  the  optimal  overcor¬ 
rection  and  adapted  aggregation  strategy  thus  requires  less  than  1/10  (1/5)  of  the  floating  point 
operations  of  the  GS  (SOR)  algorithm. 

Figure  13  shows  the  operation  counts  of  the  Multi-Level  algorithm  applied  to  the  simple 
birth-death  chain  of  length  512  with  birth  and  death  rates  of  100  and  101  respectively.  We 
consider  the  V-,  F-  and  W-cycle  types  with  varying  degrees  of  overcorrection.  For  values  of 
u  =  1,  i.e.  without  overcorrection,  it  can  be  seen  that  both  W  and  F  cycles  are  superior,  despite 
their  higher  per  iteration  cost.  All  cycle  types  show  improvements  with  overcorrection,  the  V 
cycle  being  most  sensitive,  exhibiting  divergence  when  u  is  chosen  to  be  too  small.  This  behavior 
is,  of  course,  well  known  with  the  SOR  algorithm  where  a  slightly  overestimated  value  of  u>  can 
lead  to  fast  divergence.  The  V  cycle  is  improved  by  a  factor  of  16.8,  the  F  cycle  by  6.7  and 
the  W  cycle  by  a  factor  of  5.87.  All  ML  variants  are  approximately  equally  efficient  when  their 
optimal  oj  is  chosen. 

Figure  14  shows  comparable  results  for  SOR  applied  to  the  same  problem.  Here,  however,  a 
logarithmic  scale  is  used.  We  include  the  F  cycle  result  of  figure  13  for  comparison  purposes. 
From  the  figure  the  sensitivity  of  SOR  to  the  choice  of  u  is  clearly  demonstrated:  the  range  for 
which  fast  convergence  is  achieved  is  extremely  narrow.  By  comparison,  the  F  cycle  is  much 
more  robust,  i.e.  it  is  more  tolerant  of  inaccurate  values  of  w.  The  unmodified  ML  algorithm 
is  186  times  faster  than  unmodified  SOR  (i.e.  GS),  and  the  optimal  ML  algorithm  is  still  8.7 
times  faster  than  the  optimal  SOR. 

4.3  Comparison  with  Takahaahi’s  Algorithm 

In  this  section  we  present  results  from  initial  comparisons  with  the  iterative  aggregation-dis¬ 
aggregation  algorithms  of  Takahashi  [16]  as  specified  in  the  survey  paper  by  Schweitzer  [13]. 
As  outlined  in  section  3,  the  Takahashi  algorithm  aggregates  the  fine  level  chain  of  n  states 
into  a  coarse  Markov  chain  of  N  states.  We  define  the  aggregation  factor  to  be  the  number 
of  fine  level  states  aggregated  into  each  coarse  state.  We  consider  a  birth-death  chain  of  4096 
states  with  a  birth  rate  of  2.0  and  a  dea.h  rate  of  1.0.  We  solve  the  chain  using  the  Takahashi 
algorithm  with  aggregation  factors  of  2,  4,  8,  and  16.  We  could  not  get  the  algorithm  to 
converge  for  larger  aggregation  factors.  The  ratio  of  the  Takahashi  solution  time  over  the  ML 
solution  time  is  plotted  in  figure  15.  The  computation  time  for  ML  is  constant  since  we  fix  the 
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aggregation  factor  at  2  for  the  ML  algorithm.  In  another  experiment,  results  not  shown,  we 
have  observed  that  larger  aggregation  factors  increase  the  solution  time  for  the  ML  algorithm. 
The  performance  of  the  Takahashi  algorithm  is  extremely  sensitive  to  the  aggregation  factor, 
the  computation  times  varying  by  more  than  an  order  of  magnitude,  whereas  the  ML  algorithm 
requires  no  tuning  of  any  parameters.  Interestingly,  Takahashi’s  algorithm  performs  best  at 
a  modest  ratio  of  n  =  8 JV,  as  opposed  to  the  case  N  <  n,  as  the  algorithm  was  originally 
intended.  In  the  best  case,  the  Takahashi  algorithm  requires  a  factor  of  1.8  more  computation 
time.  Results  with  the  birth-death  rates  reversed  are  similar. 

We  also  attempted  to  test  the  nested  Takahashi  algorithm,  i.e.  the  coarse  chain  is  solved 
recursively  using  the  Takahashi  algorithm,  but  only  succeeded  in  getting  the  algorithm  to  con¬ 
verge  for  a  few  experiments.  The  problem  with  convergence  has  been  previously  noted  [13].  For 
those  few  experiments  that  convergence  was  achieved,  the  ML  had  a  smaller  computation  time. 

Hence,  it  appears  that  the  Takahashi  algorithm  could  be  used  to  achieve  solution  times  ap¬ 
proaching  that  of  the  ML  algorithm,  but  suffers  the  drawbacks  of  requiring  fine  tuning  of  the 
aggregation  factor,  and  also  convergence  problems.  The  Takahashi  algorithm  was  originally 
proposed  to  reduce  the  memory  requirements  for  solving  Markov  chains,  whereas  the  ML  algo¬ 
rithm  actually  uses  more  memory  than  standard  solution  techniques,  hence  it  may  not  be  fair 
to  compare  the  two  without  taking  memory  requirements  into  consideration. 

5.  Discussion  of  the  algorithm  and  the  results 

Memory  requirements.  The  implementation  of  the  ML  algorithm  requires  one  additional 
variable  at  each  state  compared  to  the  SOR  scheme  in  order  to  store  the  temporary  values  p.  In 
addition,  the  overall  number  of  states  needed  is  higher  than  for  SOR  because  of  the  additional 
recursively  aggregated  systems.  If  the  number  of  states  of  the  original  problem  is  n  and  this 
number  is  reduced  by  a  factor  of  /  during  each  aggregation  step,  then  the  overall  number  of 
states  s  needed  by  the  ML  algorithm  is  bounded  by 

n 

e<TTj  ■ 

Thus  in  the  examples  considered  in  section  4  we  have  /  =  1/2  and  therefore  s  <  2n.  Smaller 
values  of  /  lead  to  smaller  memory  requirements,  and  indeed  we  have  observed  improvements 
in  efficiency  with  /  =  1/4  and  /  =  1/8,  giving  s  <  4n/3  and  s  <  8n/7,  respectively. 


19 


Implementation  effort.  The  ML  algorithm  is  evidently  more  complex  than  the  SOR 
scheme,  additional  coding  being  required  for  the  the  treatment  of  the  coarse  grid  equations. 
The  implementations  used  in  section  4  required  however,  only  329  and  576  lines  of  C  for  the 
SOR  and  the  ML  algorithms  respectively.  Thus  we  consider  implementation  overhead  not  to  be 
significant. 

Parallelization.  The  Multi-Level  algorithm  will  parallelize  well,  given  an  appropriate  choice 
of  smoother.  Evidently,  there  are  serious  difficulties  involved  with  the  parallel  execution  of  the 
GS  and  SOR  algorithms,  owing  to  the  recursive  nature  of  these  schemes.  However,  in  the 
context  of  multigrid  algorithms,  it  has  been  shown  that  the  ” multi- color”  style  of  GS  can  allow 
efficient  parallelization  without  compromising  convergence  speed  [8].  Parallelization  of  ML  is 
done  by  data  partitioning  within  each  level.  With  a  multi-color  smoother,  all  the  operations  of 
the  ML  method  at  a  given  level  can  be  performed  concurrently.  Communication  will  be  required 
between  processors  for  the  smoothing  step  and  collect/broadcast  operations  for  the  convergence 
test  and  enforcement  of  (2).  Although  coarser  levels  will  run  less  efficiently,  as  less  computation 
is  performed  there,  the  coarse  granularity  of  the  finer  levels,  where  most  computational  work  is 
located,  will  ensure  overall  good  parallel  performance.  Thus  we  conclude  that  the  ML  algorithm, 
too,  will  perform  well  on  a  multiprocessor  system  or  workstation  cluster.  We  are  at  present 
developing  a  parallel  version  of  the  algorithm,  and  hope  to  report  on  the  results  in  the  near 
future. 

Cycle  types.  There  are  other  alternatives  to  processing  the  levels  of  aggregation  in  the 
downward-upward  sweep  used  in  the  present  scheme.  These  can  be  obtained  by  making  the 
number  of  recursive  calls  to  procedure  mss  in  Figure  2  a  function  of  the  level  number  l.  Thus 
coarser  levels  may  be  visited  more  than  once  during  one  iteration.  Such  techniques  can,  in  the 
classical  multigrid  context,  lead  to  improved  efficiencies,  as  the  coarse  level  equations  are  then 
more  accurately  solved.  Experiments  reported  in  the  previous  section  have  shown  that  this  can 
also  be  the  case  for  the  ML  algorithm  presented  here.  One  may  furthermore  consider  dynamic 
cycling  after  Brandt  [1]  or  adaptive  cycling  according  to  Rude  [10]. 

Choice  of  aggregate  equation.  The  algebraic  multigrid  algorithm  often  defines  the  matrix 
of  the  aggregated  level  as  a  function  of  the  finer  level  matrix  and  the  prolongation  and  restriction 
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operators  represented  in  matrix  form: 


Q  =  RPI  . 

In  the  present  context  this  would  have  the  advantage  of  not  having  to  recompute  the  matrix  Q 
at  every  iteration.  However,  property  (7)  would  be  lost. 

Choice  of  coarsening  strategy.  Convergence  characteristics  may  be  improved  by  a  ju¬ 
dicious  choice  of  aggregation  strategy.  In  particular,  Markov  chains  derived  from  queueing 
networks  can  possess  a  regular  structure  which  may  be  exploited.  Experiments  have  shown  this 
to  be  the  case  for  the  tandem  queue  example  of  section  3.  More  work  is  needed  to  create  a 
Multi-Level  algorithm  which  automatically  finds  good  aggregation  strategies. 

Additive  correction.  Multigrid  algorithms  use  the  coarser  grids  to  compute  an  additive 
correction,  rather  than  a  multiplicative  one.  Our  attempts  to  develop  such  a  algorithm  were  only 
partially  successful  in  that  occasionally,  negative  correction  values  from  the  coarse  grid  induced 
negative  values  in  the  solution  at  the  finer  level,  leading  to  a  divergent  iteration.  Similar  results 
were  observed  when,  by  analogy  with  multigrid  schemes,  a  defect  was  used  to  generate  an 
inhomogeneous  aggregated  equation.  Further  work  is  needed  to  determine  whether  a  multigrid¬ 
like  algorithm  can  be  found  for  the  Markov  problem. 

6.  Conclusions  and  Outlook 

The  ML  algorithm  presented  in  this  paper  has  been  shown  to  require  significantly  less  com¬ 
putation  time  than  the  SOR  scheme  for  a  number  of  test  problems.  The  difference  between  the 
two  algorithms  increases  with  the  number  of  states  of  the  Markov  chain.  In  addition  to  being 
significantly  faster,  based  on  our  experience  to  date  it  appears  that  the  ML  algorithm  is  much 
more  resilient  to  variations  in  transition  rates.  Another  nice  property  of  the  ML  algorithm  is 
that  excellent  performance  can  be  obtained  without  the  necessity  of  tuning  a  parameter,  such 
as  the  over-relaxation  parameter  in  SOR.  We  have  shown  that  ML  performance  can  be  even 
further  improved,  up  to  a  factor  of  6  times  faster  in  our  initial  experiments,  by  using  F  and  W 
cycles  or  over-correction. 

Examination  of  a  larger  sample  of  cases,  including  Markov  chains  from  real  applications  have 
yet  to  be  made.  However,  we  feel  that  the  algorithm  shows  enough  promise  to  justify  further 
investigation  into  ML  schemes  for  solving  Markov  chains. 
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Further  work  will  also  include  the  implementation  and  testing  of  the  techniques  mentioned 
in  the  previous  section.  In  particular,  attention  will  be  paid  to  choosing  an  aggregation  strategy 
for  general  Markov  chains,  where  no  a  priori  knowledge  of  the  topology  is  available.  Several  of 
these  techniques  have  already  proven  to  provide  improved  efficiency  in  preliminary  experiments. 

A  parallel  version  of  the  ML  algorithm  is  also  in  preparation,  and  we  hope  to  be  able  to 
present  results  obtained  on  a  MIMD  supercomputer  in  the  near  future. 
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Figure  3:  Birth-Death  chain.  Top  left  Number  of  Iterations;  Top  right:  Operation  count; 
Bottom  left  :  Computation  time;  Bottom  right  :  Time  Ratio. 
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Figure  8:  Tandem  queue  problem,  variation  of  p,  Computation  times.  Left  :  pi  —  p2  —  1; 
Centre  :  pi  =  1,  p2  =  2;  Right  :  pi  =  2,  p2  =  1. 


Figure  10:  Molloy’s  Problem 
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Figure  14:  Effect  of  overrelaxation/overcorrection  for  SOR  and  ML  algorithms.  Birth  (rate  100) 
/  death  (rate  101)  chain,  length  512. 


Figure  15:  Comparison  With  Takahashi  Algorithm,  Time  Ratio. 
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